Toward a Gold Standard for Extractive Text Summarization
Abstract
Extractive text summarization is the process of selecting relevant sentences from a collection of documents, perhaps only a single document, and arranging them in a purposeful way to form a summary of this collection. The question arises just how good extractive summarization can ever be. Without generating language to express the gist of a text (its abstract), can we expect to make summaries which are both readable and informative? In search of an answer, we employed a corpus partially labelled with Summary Content Units: snippets which convey the main ideas in the document collection. Starting from this corpus, we created SCU-optimal summaries for extractive summarization. We support the claim of optimality with a series of experiments.
Similar Resources
Text Summarization Using Cuckoo Search Optimization Algorithm
Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted considerable attention from researchers. Extractive text summarization is an important summarization method that consists of selecting the most representative sentences from the input document. When we are facing large-volume documents, the extr...
Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization
Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...
Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data
Usage of online textual media is steadily increasing. Daily, more and more news stories, blog posts and scientific articles are added to the online volumes. These are all freely accessible and have been employed extensively in multiple research areas, e.g. automatic text summarization, information retrieval, information extraction, etc. Meanwhile, online debate forums have recently become popul...
Using Speech-Specific Characteristics for Automatic Speech Summarization
In this thesis we address the challenge of automatically summarizing spontaneous, multi-party spoken dialogues. The experimental hypothesis is that it is advantageous when summarizing such meeting speech to exploit a variety of speech-specific characteristics, rather than simply treating the task as text summarization with a noisy transcript. We begin by investigating which term-weighting metri...
A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization
In this paper we report our effort to construct the first ever Indonesian corpora for chat summarization. Specifically, we utilized documents of multi-participant chat from a well-known online instant messaging application, WhatsApp. We construct the gold standard by asking three native speakers to manually summarize 300 chat sections (152 of them contain images). As a result, three reference sum...